Self-supervised Low Light Image Enhancement and Denoising 


Yu Zhang Xiaoguang Di 


Bin Zhang 


Qingyan Li Shiyu Yan 


Chunhui Wang 
Department of Electronic Science and technology, Harbin Institute of Technology 
zhangyuhit2@hit.edu.cn 


Abstract 


This paper proposes a self-supervised low light image 
enhancement method based on deep learning, which can 
improve the image contrast and reduce noise at the same 
time to avoid the blur caused by pre-/post-denoising. The 
method contains two deep sub-networks, an Image Contrast 
Enhancement Network (ICE-Net) and a Re-Enhancement 
and Denoising Network (RED-Net). The ICE-Net takes the 
low light image as input and produces a contrast enhanced 
image. The RED-Net takes the result of ICE-Net and the low 
light image as input, and can re-enhance the low light im- 
age and denoise at the same time. Both of the networks can 
be trained with low light images only, which is achieved by a 
Maximum Entropy based Retinex (ME-Retinex) model and 
an assumption that noises are independently distributed. In 
the ME-Retinex model, a new constraint on the reflectance 
image is introduced that the maximum channel of the re- 
flectance image conforms to the maximum channel of the 
low light image and its entropy should be the largest, which 
converts the decomposition of reflectance and illumination
in the Retinex model into a well-conditioned problem and
allows the ICE-Net to be trained in a self-supervised way.
The loss functions of RED-Net are carefully formulated to 
separate the noises and details during training, and they 
are based on the idea that, if noises are independently dis- 
tributed, after the processing of smoothing filters (e.g. mean 
filter), the gradient of the noise part should be smaller than 
the gradient of the detail part. It can be proved qualitatively 
and quantitatively through experiments that the proposed 
method is efficient. 


1. Introduction 


Images captured in low light conditions often suffer from low
contrast, low brightness and serious noise. Low light image
enhancement methods are used to solve these problems before
high-level computer vision tasks, but few methods handle all of
them well at the same time. Recently, various deep learning based
algorithms have achieved surprising results in some image
processing and computer vision tasks, such as object detection
[24], [33], [32], [12] and image segmentation [12], [25], [2].
One of the most important reasons for the rapid development of
deep learning in these tasks is that a large number of data sets
with clear and unambiguous labels can be obtained. Although
constructing such data sets has some cost, the cost is
acceptable, and a large number of open-source data sets can be
found on the Internet to support the training of the networks.

Figure 1. Visual comparison with the supervised low light image
enhancement method KinD [42]. The proposed method improves the
contrast and brightness of the images while reducing noise and
sharpening the edges. (Best viewed on high-resolution displays
with zoom-in)


However, in low-level image processing tasks such as 
low light image enhancement, image dehazing, and image 
restoration, etc., it is difficult to obtain a large number of 
true input/label image pairs. As for low light image en- 
hancement task, in the previous work, some supervised so- 
lutions such as synthesizing low light images [26], using 
images with different exposure time [4], and so on, have 
achieved good visual effects, especially in noise reducing. 
Even so, there are still two problems with those methods. One is
how to ensure that the pre-trained network can be used for images
collected from different devices, different scenes, and different
lighting conditions without building a new training data set
(e.g. [42] failed to remove noise in the background in Fig. 1).
The other is how to determine whether the normal light image used
for supervision is the best one, since there can be many normal
light images for one low light image. Usually, the data sets with
paired low/high light images are built with manual adjustment,
which costs a lot of time and energy, and we still cannot be sure
that the normal light images serve the training task well.


In this paper, to overcome the problems in previous works, we
propose a self-supervised low light image enhancement framework
that realizes image contrast enhancement and denoising at the
same time. Similar to previous works [26], two networks are used
to achieve contrast enhancement and denoising, respectively.
However, different from the previous work, the two networks, i.e.
the Image Contrast Enhancement Network (ICE-Net) and the
Re-Enhancement and Denoising Network (RED-Net), are trained in a
self-supervised way. The RED-Net is designed to reduce noise by
re-enhancing the contrast of low light images, which reduces the
loss of information caused by pre-/post-processing with existing
denoising methods.

For the training of ICE-Net, a Maximum Entropy based 
Retinex (ME-Retinex) model was proposed. Different from 
previous Retinex models which only assume that illumina- 
tion is smooth, in the ME-Retinex model, we introduce a 
new constraint on the reflectance image that the maximum 
channel of the reflectance image conforms to the maximum 
channel of the low light image and its entropy should be 
largest. With a constraint on reflectance, we can directly
control the image enhancement level and convert the
ill-conditioned decomposition of reflectance and illumination
into a well-conditioned problem.


For the training of RED-Net, we adopt the assumption that the
noise conforms to the Poisson distribution and is independent
across pixels. Under this assumption, the gradients of noise
should be smaller than the gradients of details after the
processing of smoothing filters, and most of them are even close
to zero. It is then possible to separate most noise and details
in the reflectance by treating the gradients calculated from the
smoothed reflectance as weights. At the same time, since our task
is to enhance the image contrast, and edges with higher gradients
often have higher contrast, the loss functions of RED-Net can be
designed to make the gradients of details and edges higher to
achieve better contrast enhancement, which differs from previous
works that only try to preserve edges. Based on those ideas,
several self-supervised loss functions are formulated and their
effectiveness is demonstrated through experiments.


The loss functions in this paper allow self-supervised training,
which means that the image enhancement task can be solved
directly for one specific image, whether by a CNN (Convolutional
Neural Network) or by analytical methods. However, more training
data for the CNN often leads to better results (e.g. Fig. 4), and
a CNN usually needs less processing time than analytical methods.
The proposed method is independent of the way the low light
images are acquired, and the training process is completely
self-supervised, so the proposed method has good generalization
ability. Even if the pre-trained network does not perform well
enough in a new environment, it can be retrained or fine-tuned
without building a paired/unpaired normal light image data set.
Our contributions can be summarized as:


• We proposed a framework for enhancing low light images, which
  can enhance the image contrast and reduce noise at the same
  time. Through the close coupling of the two, we can reduce the
  loss of information in image enhancement tasks (e.g. blur
  caused by commonly used post-denoising).

• We proposed an Image Contrast Enhancement Network (ICE-Net) and
  a Re-Enhancement and Denoising Network (RED-Net), and both of
  them can be trained by self-supervision, which gets rid of the
  dependence on paired or unpaired images. Also, the RED-Net
  proposed in this paper can be combined with other Retinex or
  HSV based image enhancement methods to achieve re-enhancement
  and noise suppression, even AHE (Adaptive Histogram
  Equalization [31]) which produces heavy noise, and this is
  helpful for many previous studies on contrast enhancement.

• We compare the proposed method with several state-of-the-art
  methods via some comprehensive experiments, and the results are
  measured by objective indexes and visual quality. All results
  consistently proved the effectiveness of the proposed method.


2. Related works 


Low light image enhancement. Directly adjusting the 
contrast of the low light image is probably the most in- 
tuitive and easy way to realize image enhancement, such 
as Histogram Equalization(HE), and other improved meth- 
ods based on HE [30, 36, 28, 3, 19]. Although those im- 
proved methods are proposed to achieve noise suppression, 
hue preserving, brightness preserving, and so on, there are 
still many problems in directly adjusting the contrast, such as
over- and under-enhancement, noise amplification, etc. Gamma
correction is another kind of mapping, which is also frequently
used for low light image enhancement. Although it can provide
good image brightness and stretch the contrast in dark or bright
areas, it still cannot avoid noise amplification, and its result
usually depends heavily on the Gamma value, which is chosen
manually.


Retinex has been a widely used model for low light image
enhancement in recent years. According to Retinex theory, an
image can be decomposed into reflectance and illumination. The
early works SSR [15] and MSR [14] treat the reflectance as the
final enhancement result. However, since the decomposition is an
ill-posed problem and lacks sufficient constraints on the
reflectance, the enhanced image often shows unrealistic phenomena
such as over-enhancement and whitening. It is also hard for those
methods to reduce noise. In recent works, the illumination is
enhanced after the decomposition, and the final enhanced image is
obtained by recombining the enhanced illumination and
reflectance. However, the enhanced image may still contain noise,
and an extra post-denoising procedure has to be performed
[11, 8], which blurs the details. [34] introduced a joint
low-light enhancement and denoising method, which can achieve
denoising and enhancement simultaneously. [21] further improved
the method by adding a noise map to the conventional Retinex
model. Although those methods report promising results, most of
them need multiple iterations for the decomposition, which costs
a lot of time. Meanwhile, as there is no method to automatically
manipulate the illumination, the enhanced image may not have a
proper contrast and usually needs careful parameter tuning.


Recently, the amazing performance of deep learning also 
inspired some promising works in low light image enhance- 
ment, including supervised works [26, 37, 42, 6] and unsu- 
pervised works [13, 10]. Most of the early works based on 
supervised learning train the networks with synthetic data 
sets, such as [26, 20, 38, 35], etc. Although the data ob- 
tained by these works seems to be dark and noisy, they 
are still different from a natural one. Chen et al. [4] in- 
troduce a dataset which contains real raw low light images 
and corresponding raw high light images for training. As 
there can be lots of reference images for one input low light 
image, they introduce an amplification ratio in the network 
to achieve correspondence between the input and reference. 
This method handles noise and color distortion well. However,
the ratio must be chosen by the user during testing, which limits
the wide use of this method, and there may be some
over-/under-enhanced areas in the image when only one ratio is
used. [37] introduced a dataset named LOL which contains real
paired low and high light images. And
it introduced the Retinex model into the training process to 
connect the reflection images of the input and reference, and 
proposed to denoise on reflectance with BM3D[5]. How- 
ever, it will still cause blur or remain noisy, and it’s hard 


to find a balance between the two. [42] added a subnet called
restoration-net to achieve denoising on reflectance, and provided
an extra brightness ratio to control the illumination. However,
during testing, the ratio parameter still needs to be adjusted
manually to obtain better enhancement results. Although these
methods use real low light
data for training, due to the lack of constraint on the con- 
trast of the enhanced image, it can not avoid the problem 
of over-enhancement (saturation) or under-enhancement in 
the enhanced image, even with artificial adjustment of pa- 
rameters. In the unsupervised works, [13] proposed a GAN- 
based method which can be trained with unpaired data, but 
it cannot control the enhancement results. [10] proposed a 
zero-reference low light image enhancement method, which 
can be trained without any paired or unpaired data. How- 
ever, it did not provide any noise removal methods. 

Image denoising. Many denoising methods have been proposed over
the past few decades, including conventional methods [5, 9] and
learning based methods [40, 23, 1]. However, those denoising
methods are not specially designed for the low light image
enhancement task. Whether they are used for pre- or
post-processing, they cause loss of details, and the learning
based methods may even fail for a different kind of noise
distribution. [41] proposed a denoising method for low light
image enhancement; however, it needs to be trained with paired
low/high light image data. Recently, [18] proposed an
unsupervised denoising method named N2V, which can be trained
with noisy images only; however, in our tests, it still causes
blur even when retrained with the enhanced images.

Figure 2. The structure of the proposed method. RED-Net takes the
low light image and the max channel of the output of ICE-Net as
input.


3. Proposed Model 


The proposed method aims at achieving low light image enhancement
and denoising without any manual adjustment when only low light
images are available. For example, when the camera enters a new
low light environment, the pre-trained networks may not work for
the different data distribution, and the only data we can get are
low light images. In order to achieve automatic contrast
enhancement, we propose a Maximum Entropy based Retinex model and
a self-supervised ICE-Net that takes advantage of multiple
images. As for denoising, we propose a self-supervised RED-Net
which is specially designed for the low light image enhancement
task. Through the combination of re-enhancement and denoising,
the RED-Net can preserve more details during the denoising
process. The structure of the proposed method is shown in Fig. 2.


3.1. ME-Retinex model and ICE-Net 


Recently, a lot of low-light image enhancement works 
are based on the following Retinex model: 


$$S = R \circ I \tag{1}$$

where $S$ and $I$ represent the captured image and the
illumination image respectively, $R$ represents the reflectance,
which some works treat as the desired enhanced image, and $\circ$
represents element-wise multiplication. Most recent works assume
that the three color channels of the image share the same
illumination in order to simplify the model [37], [42], and the
maximum value of the three color channels is generally used as
the initial estimate of the illumination map [11]. It has been
shown that image enhancement methods based on this simplified
Retinex model are equivalent to directly controlling the contrast
of the V channel in HSV color space while keeping the H and S
channels unchanged [41]. However, there are still some
differences between those two types of methods. Methods which
directly control the image contrast usually stretch the contrast
of some areas and compress the contrast of other areas, and the
compressed areas lose details (over-/under-enhancement can be
treated as missing details). For example, HE merges the smaller
bins, and Gamma correction compresses the contrast of bright
areas; both cause loss of details. Methods based on Retinex
usually do not constrain the contrast of the target enhanced
image (whether $R$ or $R \circ I$), which makes the results
uncertain.
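
The equivalence between the simplified Retinex model and a V-channel adjustment can be checked on a single pixel. The following is a minimal sketch, not the authors' code: the function names and the gamma value are hypothetical, and the max channel is used directly as the shared illumination.

```python
import colorsys

def retinex_split(pixel, eps=1e-6):
    """Split one RGB pixel (values in [0, 1]) into reflectance and a shared illumination."""
    illum = max(max(pixel), eps)            # max channel as the illumination estimate
    refl = tuple(c / illum for c in pixel)  # S = R * I  =>  R = S / I
    return refl, illum

def enhance_pixel(pixel, gamma=0.4):
    """Brighten only the illumination (gamma is a hypothetical choice) and recombine."""
    refl, illum = retinex_split(pixel)
    return tuple(r * illum ** gamma for r in refl)

dark = (0.20, 0.10, 0.05)
bright = enhance_pixel(dark)
print(colorsys.rgb_to_hsv(*dark))    # hue, saturation, value before enhancement
print(colorsys.rgb_to_hsv(*bright))  # hue and saturation unchanged, value raised
```

Because all three channels are scaled by the same factor, only V changes, which is why a constraint on the contrast of the max channel is enough to control the enhancement level.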

If we explain HE or Gamma correction in terms of the Retinex
model, the enhanced image $R$ is obtained through $S/I$ without
the constraint that the illumination $I$ is smooth, so the
missing details are retained in the illumination $I$. Conversely,
if a richly textured area $S$ is divided by a smooth $I$, the
details stay in $R$, which avoids the loss of details in HE or
Gamma correction. We can therefore combine the method of directly
controlling the contrast with the Retinex model to take advantage
of both. Compared with direct contrast control alone, the
combination also suppresses noise through the assumption that the
illumination is smooth.

Typically, a Retinex based method can be expressed as 
follows: 

$$\min \; l_{rcon} + \lambda_1 l_R + \lambda_2 l_I \tag{2}$$

where $l_{rcon}$, $l_R$ and $l_I$ represent the reconstruction
loss, reflectance loss and illumination loss, respectively, and
$\lambda_1$ and $\lambda_2$ are weight parameters. The
reconstruction loss $l_{rcon}$ can be expressed as:

$$l_{rcon} = \| S - R \circ I \|_1 \tag{3}$$

where $\| \cdot \|_1$ represents the $L_1$ norm. We use the $L_1$
norm to constrain all the losses, and do not compare the impact
of $L_1$, $L_2$, SSIM and other loss functions on low level image
processing tasks, since there are already some related studies
such as [43].

In this paper, we choose the HE method to form a Max- 
imum Entropy based Retinex model, then the reflectance 
loss is formulated as: 

$$l_R = \left\| \max_{c \in \{r,g,b\}} R^c - F\left( \max_{c \in \{r,g,b\}} S^c \right) \right\|_1 + \lambda \| \nabla R \|_1 \tag{4}$$

where $F(x)$ denotes the histogram equalization operator applied
to image $x$, $\lambda$ is a weight parameter, and $\nabla$
denotes the gradient operator. The first term of this loss
function means that the maximum channel of the reflectance should
conform to the maximum channel of the low light image and have
the maximum entropy, which can be considered as directly
controlling the contrast of the enhanced image. The second term
is a commonly used smoothing term to suppress noise, but it
usually cannot distinguish image details from noise well.
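
As an illustration of the first term of Equation 4, the sketch below equalizes the max channel of a few low light pixels and measures how far a candidate reflectance is from that target. It is a simplified stand-in under stated assumptions: pixels are a flat list rather than an image, the mean absolute difference replaces the $L_1$ sum, the bin count is arbitrary, and the function names are hypothetical.

```python
def hist_equalize(values, bins=256):
    """Histogram-equalize a flat list of intensities in [0, 1] via the empirical CDF."""
    idx = [min(int(v * bins), bins - 1) for v in values]  # bin index of every value
    hist = [0] * bins
    for i in idx:
        hist[i] += 1
    cdf, total = [], 0
    for count in hist:
        total += count
        cdf.append(total / len(values))
    return [cdf[i] for i in idx]

def reflectance_fidelity(refl_pixels, low_pixels):
    """Mean |max_c R^c - F(max_c S^c)| over pixels (mean instead of sum is an assumption)."""
    target = hist_equalize([max(p) for p in low_pixels])
    r_max = [max(p) for p in refl_pixels]
    return sum(abs(a - b) for a, b in zip(r_max, target)) / len(target)

low = [(0.02, 0.01, 0.0), (0.05, 0.04, 0.01), (0.10, 0.02, 0.03), (0.20, 0.10, 0.05)]
target = hist_equalize([max(p) for p in low])
# Scale each pixel so that its max channel matches the equalized target.
refl = [tuple(c * t / max(p) for c in p) for p, t in zip(low, target)]
print(target)
print(reflectance_fidelity(low, low), reflectance_fidelity(refl, low))
```

The unprocessed low light pixels give a large fidelity value, while the rescaled pixels drive it to (numerically) zero, which is the behaviour the first term asks of the reflectance.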

For the illumination loss, we adopt the structure-aware 
smoothness loss proposed in [37]: 


$$l_I = \| \nabla I \circ \exp(-\lambda_3 |\nabla R|) \|_1 \tag{5}$$

According to [37], Equation 5 makes the illumination loss aware
of the image structure. This loss means that the original TV
function $\| \nabla I \|_1$ is weighted with the gradient of the
reflectance.
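
A one-dimensional sketch of Equation 5 shows the effect of the weighting: an illumination edge is penalized far less when the reflectance has an edge at the same position. The signals and the value of `lam3` are hypothetical, and forward differences stand in for the gradient operator.

```python
import math

def forward_diff(signal):
    """Forward-difference gradient of a 1D signal."""
    return [b - a for a, b in zip(signal, signal[1:])]

def illumination_smoothness(illum, refl, lam3=10.0):
    """Sum of |dI| * exp(-lam3 * |dR|), a 1D version of the structure-aware loss."""
    d_i = forward_diff(illum)
    d_r = forward_diff(refl)
    return sum(abs(gi) * math.exp(-lam3 * abs(gr)) for gi, gr in zip(d_i, d_r))

illum = [0.2, 0.2, 0.8, 0.8]         # illumination with one step edge
refl_edge = [0.1, 0.1, 0.9, 0.9]     # reflectance edge at the same position
refl_flat = [0.5, 0.5, 0.5, 0.5]     # no structure in the reflectance

print(illumination_smoothness(illum, refl_edge))  # small: the edge is treated as structure
print(illumination_smoothness(illum, refl_flat))  # plain total variation of the illumination
```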

We introduce an ICE-Net to solve the optimization problem in
Equations 2-5. This raises a question: why introduce a CNN to do
that? Since the ideal image can be obtained by minimizing the
total loss, one could run the optimization directly on $R$ and
$I$ for a single image $S$, and the introduction of a CNN does
not seem to be necessary. However, most optimization processes
need multiple iterations, which is time consuming, and the
solution becomes more complicated as more constraints are added.
At the same time, HE is a global enhancement method, which
inevitably makes some local areas too bright or too dark. By
introducing a CNN and training on multiple images, this problem
can be avoided, as shown in Fig. 4. This is because under the HE
constraint, the same local area in different images is enhanced
to different degrees, and the CNN is trained to find the median
value under the $L_1$ norm instead of becoming too bright or too
dark. In addition, the loss function and ICE-Net are designed to
learn an appropriate enhancement of contrast and brightness, so
we do not make any special design for denoising here; the RED-Net
designed in the next sub-section handles denoising.
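
The remark about the median can be illustrated numerically: the constant that minimizes an $L_1$ cost against several targets is their median, so one outlying HE target does not drag the answer toward over-brightness. This is only an illustration of that property with made-up target values; the CNN does not run such a grid search.

```python
import statistics

def l1_cost(t, targets):
    """Sum of absolute deviations between a constant t and all targets."""
    return sum(abs(t - x) for x in targets)

def best_constant(targets, steps=1000):
    """Grid-search the constant in [0, 1] that minimizes the L1 cost."""
    grid = [k / steps for k in range(steps + 1)]
    return min(grid, key=lambda t: l1_cost(t, targets))

# Hypothetical HE targets for the same local area in five training images;
# the last one is an over-bright outlier.
targets = [0.30, 0.35, 0.40, 0.45, 0.95]

print(best_constant(targets))      # L1 minimizer
print(statistics.median(targets))  # median of the targets
print(statistics.mean(targets))    # the mean is pulled toward the outlier
```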


3.2. RED-Net 


After the processing of ICE-Net, although the contrast of the
image has been improved, there is still some noise in the image.
Inspired by [41], which introduces a Conditional Re-Enhancement
Network (CRE-Net) to denoise for low light image enhancement
tasks, we further propose a self-supervised RED-Net to re-enhance
the low light image and denoise at the same time. In this part,
we still build the loss function based on Equation 2; however,
every sub-loss function is modified. For both the reconstruction
loss $l'_{rcon}$ and the reflectance loss $l'_R$ in RED-Net, we
adopt the assumption that the noise conforms to the Poisson
distribution [39], which is more in line with real low light
image noise. In order to distinguish them from the variables in
ICE-Net, we add the superscript $'$ to the variables in RED-Net,
and the reconstruction loss can be expressed as:

$$l'_{rcon} = R' \circ I' - S \circ \log (R' \circ I') \tag{6}$$

where $R'$ and $I'$ represent the reflectance and illumination
produced by RED-Net, respectively, and $R'$ is also the target
enhanced image of the whole proposed method.
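
Equation 6 has the form of a Poisson negative log-likelihood with the terms independent of the prediction dropped. A minimal per-pixel check, with a hypothetical observed value and a simple grid search, confirms that this form is minimized when the reconstruction $R' \circ I'$ equals the observation $S$:

```python
import math

def poisson_nll(pred, observed):
    """Per-pixel Poisson negative log-likelihood, up to terms independent of pred."""
    return pred - observed * math.log(pred)

observed = 0.3                             # hypothetical low light pixel value S
grid = [k / 1000 for k in range(1, 1001)]  # candidate reconstructions in (0, 1]
best = min(grid, key=lambda x: poisson_nll(x, observed))

print(best)  # the minimizing reconstruction coincides with the observation
```

Setting the derivative $1 - S / x$ to zero gives $x = S$, which matches the grid search.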

We argue that an image can be divided into different 
components, including noise, flat area, details and structure 
information, and there is no clear dividing line between de- 
tails and structure information. There are many methods 
to remove noise, but the key to the problem is how to sep- 
arate details and structure from noise, then to preserve or 
even strengthen those details and structure during denoising. In
order to make the reflectance less noisy while preserving rich
details and sharp edges, we design the reflectance loss as
follows:

$$l'_R = \max_{c \in \{r,g,b\}} {R'}^{c} - \max_{c \in \{r,g,b\}} R^{c} \circ \log\left( \max_{c \in \{r,g,b\}} {R'}^{c} \right) + \lambda \left\| W \circ N(|\nabla R'|) \circ \exp\left(-\lambda_3 W \circ N(|\nabla R'|)\right) \right\|_1 \tag{7}$$

where $N(x)$ and $|x|$ represent the local normalization of $x$
and the absolute value of $x$, respectively. $R$ and $R'$
represent the output reflectance of the ICE-Net and RED-Net,
respectively. $W$ represents weights, which can be calculated as
follows:

$$W = N(|\nabla (G(R))|) \tag{8}$$

where $G(x)$ represents a smoothing filter applied to $x$ (the
mean filter is used in the proposed method). The graph of the
second-term function $x \exp(-\lambda x)$ is shown in Fig. 3.
Intuitively, after smoothing, there are still gradients in the
details and structure, even though they are smaller than before,
while the noise and smooth areas may have no gradients or much
smaller gradients. We can therefore use the gradients of the
smoothed image as the weight. As can be seen in Fig. 3, when the
loss function has the form $x \exp(-\lambda x)$, small $x$
becomes smaller and large $x$ becomes larger during training.
Through the local normalization, the details and structure are
more likely to fall on the right side of the curve, which makes
them sharper during training. As shown in Fig. 5, noise is well
removed and the details are preserved.
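
The two ideas in this paragraph can be sketched on toy 1D signals. The example below is an illustration under assumptions, not the authors' implementation: the noise values are hand-picked, the window size and $\lambda$ are arbitrary, the gradual edge is a plain ramp, and the local normalization $N$ is omitted because its definition is not given here.

```python
import math

def mean_filter(signal, k=3):
    """Valid-mode mean filter G over a 1D signal (window size k is an assumed value)."""
    return [sum(signal[i:i + k]) / k for i in range(len(signal) - k + 1)]

def mean_abs_grad(signal):
    """Average magnitude of the forward-difference gradient."""
    diffs = [abs(b - a) for a, b in zip(signal, signal[1:])]
    return sum(diffs) / len(diffs)

def penalty_slope(x, lam):
    """Derivative of x * exp(-lam * x); its sign tells which way minimization pushes x."""
    return (1.0 - lam * x) * math.exp(-lam * x)

noise = [0.04, -0.05, 0.03, -0.02, 0.05, -0.04, 0.02, -0.03]  # hand-picked toy noise
ramp = [0.1 * i for i in range(10)]                           # a gradual edge

print(mean_abs_grad(noise), mean_abs_grad(mean_filter(noise)))  # noise gradient shrinks
print(mean_abs_grad(ramp), mean_abs_grad(mean_filter(ramp)))    # ramp gradient is kept
print(penalty_slope(0.1, 2.0), penalty_slope(0.9, 2.0))         # positive, then negative
```

Smoothing shrinks the gradient of the noise while leaving the gradient of the gradual edge unchanged, which is what makes the smoothed gradient usable as a weight. The slope of $x \exp(-\lambda x)$ changes sign at $x = 1/\lambda$: below that point minimization decreases $x$, above it minimization increases $x$.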

Figure 4. The results of ICE-Net with different training data.
(a) Input. (b) The network was trained with multiple data.
SSIM: 0.6743, PSNR: 23.4716, NIQE: 3.9140. (c) The network was
trained with (a) only. SSIM: 0.4858, PSNR: 15.5112, NIQE: 4.9367.
(d) Reference.


Typically, the illumination is expected to retain only structural
information and ignore detailed information. Therefore we can
adopt a design similar to the reflectance loss, and the
illumination loss can be expressed as follows:

$$l'_I = \left\| W_I \circ N(|\nabla I'|) \circ \exp\left(-\lambda_3 W_I \circ N(|\nabla I'|)\right) \circ \exp\left(-\lambda_3 W_R \circ N(|\nabla R'|)\right) \right\|_1 \tag{9}$$

where $W_I$ and $W_R$ represent weights, which can be calculated
as follows:

$$W_I = N(|G(\nabla I')|) \tag{10}$$

$$W_R = N(|G(\nabla R)|) \tag{11}$$

where $G(x)$ and $N(x)$ still represent the smoothing filter and
the local normalization of $x$, respectively. Different from $W$
in Equation 8, the order of the gradient operation $\nabla$ and
the smoothing operation $G$ is switched. It can be considered
that, for noise and details, the mean value of the gradient in a
local area should be small, which is quite different for the
structure. For example, text on white paper may have opposite
gradients in a local area, which makes the mean gradient close to
zero. Then with $W_I$ and the specially designed loss form
$x \exp(-\lambda x)$, we can separate the noise and details from
the structure in the illumination, and make the structure edges
sharper during training. We also keep
$\exp(-\lambda_3 W_R \circ N(|\nabla R'|))$ and introduce the
weight $W_R$ to ensure the consistency of the structural
information of the reflectance and the illumination. It should be
noted that the weight items $W$, $W_R$ and $W_I$ do not
participate in the back propagation process during training.

Figure 5. The results generated with different losses in RED-Net.
(a) Input low. (b) Reference high. (c) AHE [31]. (d) AHE &
RED-Net. (e) w/o $W$. (f) w/o $W_I$ and $W_R$. (g) w/o
$\exp(-\lambda_3 W_I \circ N(|\nabla I'|))$. (h) w/o
$\exp(-\lambda_3 W_R \circ N(|\nabla R'|))$.


4. Experiments 


We use the LOL dataset [37], which contains 500 low/normal light
image pairs, 485 of which are used for training; each image is
400 × 600 pixels. Note that during training we only use the
natural low light images, without any synthetic data or normal
light images. The batch size is set to 16 and the patch size is
set to 48 × 48. We use Adam stochastic optimization [17] to train
the network and the learning rate is set to 0.001. The training
and testing of the network are completed on an Nvidia RTX 2080Ti
GPU and an Intel Core i9-9900K CPU, and the code is based on the
TensorFlow framework.

To evaluate the performance of the proposed method on enhancing
low light images, we quantitatively and visually compare our
method with several low light image enhancement methods,
including LIME [11], RRM [21], Retinex-Net [37] and KinD [42]. We
also collected some data from other data sets for testing.

Three metrics are adopted for quantitative comparison: Peak
Signal-to-Noise Ratio (PSNR), Structural SIMilarity (SSIM) [44],
and NIQE [27]. NIQE is a no-reference image quality assessment
method, which evaluates the naturalness of the image; a lower
value indicates better quality. PSNR and SSIM are full-reference
image quality assessment methods, which indicate the noise level
and the structural similarity between the result and the
reference, respectively.
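
For reference, PSNR follows the standard definition $10 \log_{10}(\mathrm{peak}^2 / \mathrm{MSE})$. The sketch below computes it on flat pixel lists; the peak value of 1.0 assumes images normalized to [0, 1], and the function name is illustrative rather than taken from the evaluation code of the paper.

```python
import math

def psnr(result, reference, peak=1.0):
    """Peak Signal-to-Noise Ratio in dB between two equally sized flat pixel lists."""
    mse = sum((a - b) ** 2 for a, b in zip(result, reference)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)

print(psnr([0.5, 0.5], [0.6, 0.4]))  # MSE = 0.01, i.e. about 20 dB
```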


4.1. Ablation Study 


In this part, to demonstrate the necessity of introducing the CNN
and the effectiveness of each component of the proposed method,
we conduct two ablation studies.

Contribution of ICE-Net. This ablation study answers the question
of why we do not just optimize the loss function directly to get
the result, like other variational Retinex models [16, 29], given
that the network can be trained in a self-supervised way. As
mentioned in Sec. 3.1, the CNN based ICE-Net is introduced in our
method to avoid the problems caused by HE by training on multiple
images. Since it is difficult to solve Equation 2 directly
through variational methods under the loss functions proposed in
this paper, we instead use a CNN trained with only a single low
light image, and its result can be considered as a solution to
Equation 2.

In Fig. 4, we present the results of our ICE-Net trained with one
single low light image and with multiple images. The results show
that training with multiple images gives better enhancement of
contrast and brightness. As can be seen in Fig. 4(c),
optimization on a single low light image cannot avoid the
under-enhancement or over-enhancement (e.g. green pipes and metal
hinges) caused by HE. In Fig. 4(b), in contrast, training with
multiple images gives every local area of the enhanced image a
more proper brightness. The objective indexes agree: training
with multiple images yields better PSNR, SSIM and NIQE, which
means that the result has less noise, is closer to the reference
and is more in line with natural images.


Figure 6. Visual comparison with different losses in RED-Net.
(a)-(b) and (c)-(d) are trained with and without
$\exp(-\lambda_3 W_I \circ N(|\nabla I'|))$, respectively. Please
zoom in to see the details.


Contribution of each loss in RED-Net. We present the results of
RED-Net trained with different losses and weights in Fig. 5. In
order to better illustrate the importance of each part of the
loss function, we take AHE [31], which produces serious noise
during processing, as the contrast enhancement method (e.g.
Fig. 5(c)), and study the re-enhancement and denoising effects
under different loss functions. We use the complete loss function
as the baseline (containing Equations 6 to 11 in full), and study
the influence of removing different weight terms, including

• without $W$ in Equation 7 (Fig. 5(e))

• without $W_I$ and $W_R$ in Equation 9 (Fig. 5(f))

• without $\exp(-\lambda_3 W_I \circ N(|\nabla I'|))$ in
  Equation 9 (Fig. 5(g))

• without $\exp(-\lambda_3 W_R \circ N(|\nabla R'|))$ in
  Equation 9 (Fig. 5(h))


As shown in Fig. 5(d), with all the proposed loss functions, the
RED-Net clearly reduces noise while preserving details. When we
simply remove $W$ (Fig. 5(e)), only the obvious structure is
preserved, which proves the importance and effectiveness of
separating the noise and details through $W$. When we remove
$\exp(-\lambda_3 W_R \circ N(|\nabla R'|))$ (Fig. 5(h)), the
details are lost and some obvious edges are slightly blurred.
When we remove $W_I$ and $W_R$, which are designed to smooth
noise and details and preserve structure in the illumination,
some details in the reflectance are blurred (Fig. 5(f)), which
proves the importance of smoothing the noise and details in the
illumination and also the effectiveness of our design.

Comparing Fig. 5(g) and (d), it seems that the third kind of loss
in this ablation study, which removes
$\exp(-\lambda_3 W_I \circ N(|\nabla I'|))$ from the illumination
loss, does not affect the reflectance. However, as can be seen in
Fig. 6(b) and (d), the edges in the illumination (e.g. the edge
in the red rectangles) are blurred under the third kind of loss,
which may cause a halo effect in the reflectance, and a good
illumination can also help in future work (e.g. avoiding
over-enhancement). We also study the case in which the Poisson
distribution is not used. However, with AHE [31], the output of
the RED-Net is then totally unacceptable, and even the structure
cannot be preserved.


4.2. Comparison with State-of-the-Arts 


In this subsection we compare the performance of the proposed
method with current state-of-the-art methods through qualitative
and quantitative experiments. In these experiments, we use not
only the LOL dataset, but also some standard datasets collected
from previous works, including LIME [11] (10 images), MF (10
images), and VV¹ (23 images).

We compare the combination of ICE-Net and RED-Net with previous
methods that can achieve contrast enhancement and denoising,
including LIME [11], which has a denoising post-processing step;
RRM [22], which jointly enhances contrast and denoises;
Retinex-Net [37], which is trained in a supervised way and
denoises the reflectance with BM3D [5]; and KinD [42], which is
trained in a supervised way for both contrast enhancement and
denoising. The code is downloaded from the authors' homepages and
the parameters are set as recommended in those papers. The
results are shown in Fig. 7 and Tables 1 and 2.

Fig. 7 shows the qualitative evaluation results. Compared
with LIME and Retinex-Net, which denoise the reflectance
with BM3D, the two-stage


¹ https://sites.google.com/site/vonikakis/datasets/challenging-dataset-for-enhancement
Figure 7. Visual comparison with other state-of-the-art methods. Each row comes from a different data set and each column from a
different method. From left to right: Input, KinD [42], LIME [11], Retinex-Net [37], RRM [22], and the method proposed in this paper.
Please zoom in to see the details.


method proposed in this paper, which first enhances and
then re-enhances and denoises, keeps a better balance
between denoising and detail preservation, and is even
comparable to the supervised method KinD (e.g., the books
in the bookcase are blurred when processed by LIME and
still show serious noise when processed by Retinex-Net,
whereas both our method and KinD preserve the text on the
books well). The second and last rows of Fig. 7 also show
that our method works under serious noise and non-uniform
illumination (e.g., the face and arm in shadow are well
enhanced). At the same time, because we assume that noise
differs from detail in being independently distributed,
which is not always true, the rough wall in the blue
rectangle in the third row is smoothed. (More detailed
experiments, comparisons, network structures, and parameters
are included in the supplementary material. In our
experiments, the impact of network structure is not
significant; RED-Net has the same network structure as in
[41], and ICE-Net is similar to RED-Net but has fewer
layers.)
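
The assumption behind this behaviour can be illustrated with a small one-dimensional experiment. The sketch below is not the RED-Net loss; it only checks, under assumed settings (Gaussian noise, a step edge as the "detail", a mean filter of width 5, and hypothetical helper names), how much gradient survives mean filtering for independent noise versus a structured edge.

```python
import random


def mean_filter(signal, k=5):
    """Valid-mode moving average with window size k."""
    return [sum(signal[i:i + k]) / k for i in range(len(signal) - k + 1)]


def total_abs_gradient(signal):
    """Sum of absolute forward differences."""
    return sum(abs(b - a) for a, b in zip(signal, signal[1:]))


random.seed(0)
n, k = 200, 5
noise = [random.gauss(0.0, 0.1) for _ in range(n)]  # independent noise
edge = [0.0] * (n // 2) + [1.0] * (n // 2)           # structured detail

# Fraction of the total gradient that survives the mean filter.
noise_ratio = total_abs_gradient(mean_filter(noise, k)) / total_abs_gradient(noise)
edge_ratio = total_abs_gradient(mean_filter(edge, k)) / total_abs_gradient(edge)
```

The edge keeps essentially all of its total gradient after filtering (it is only spread over more samples), while the gradient of the independent noise shrinks roughly in proportion to the window size. A texture whose pixels behave like independent noise, such as the rough wall, would be suppressed in the same way, which matches the failure case noted above.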

Tables 1 and 2 show the quantitative evaluation results.
Our method gets a poor NIQE, the highest PSNR, and a
middling SSIM, which means that the final enhanced image
differs from natural images and from the reference, while
noise in the images is well removed. This is because, when
designing the ICE-Net and RED-Net, we mainly consider the
ability of the algorithm to adapt automatically to new
environments and to remove noise, and we do not include any
natural image prior in the loss functions, especially in the
RED-Net.


Table 1. NIQE scores on each subset (LOL [37], LIME [11],
MF [7], VV). A smaller NIQE indicates images more in line
with natural images.

Dataset     LIME [11]   RRM [22]   Retinex-Net [37]   KinD [42]   Proposed
LIME [11]   4.08        4.03       4.37               3.59        5.07
LOL [37]    3.95        3.95       9.06               3.89        4.33
MF [7]      3.44        3.68       3.88               3.31        4.59
VV          3.22        3.31       3.57               2.91        4.01


Table 2. SSIM and PSNR scores on the LOL [37] data set.
Higher SSIM and PSNR indicate closer agreement with the
reference and less noise, respectively.

Metric   LIME [11]   RRM [22]   Retinex-Net [37]   KinD [42]   Proposed
PSNR     17.22       13.88      16.82              17.64       18.34
SSIM     0.60        0.66       0.57               0.76        0.65


At the same time, during enhancement our goal is to enhance
the details of each local area, which gives results quite
different from the reference images obtained by adjusting
the exposure time.
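
For reference, PSNR is conventionally defined as 10 log10(MAX² / MSE). The sketch below follows that standard definition, assuming pixel values normalized to [0, 1] and images given as flat sequences of equal length. The function name `psnr` and the toy inputs are illustrative; the exact evaluation code behind Table 2 is not stated in this paper.

```python
import math


def psnr(reference, estimate, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(reference) != len(estimate):
        raise ValueError("inputs must have the same length")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy example: every pixel is off by 0.1, so the MSE is 0.01.
reference = [0.0, 0.5, 1.0, 0.5]
estimate = [0.1, 0.4, 0.9, 0.6]
score = psnr(reference, estimate)
```

With an error of 0.1 at every pixel the MSE is 0.01, which corresponds to a PSNR of 20 dB. Because PSNR only measures pixel-wise error against the reference, a method that enhances local details differently from the reference exposure can still be penalized by it.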


5. Conclusion and Future Work 


In this paper, aiming at automatically enhancing and
denoising low light images, we create a two-stage framework
that first enhances the image contrast and then re-enhances
and denoises. Both networks in our method can be trained in
a self-supervised way, which means that the proposed method
can be used in new, unfamiliar environments and with new
devices. The experimental results on various low light data
sets show that our method is comparable with many
state-of-the-art methods on both visual effect and objective
metrics. Our future work will explore how to restore the
color degradation, how to combine the RED-Net and the
ICE-Net, and how to combine low light image enhancement
with high-level tasks to further improve real-time
performance.